Language models have recently achieved strong performance across a wide range of NLP benchmarks. However, unlike benchmarks, real-world tasks are often poorly specified, and agents must deduce the user's intended behavior from a combination of context, instructions, and examples. We investigate how both humans and models behave in the face of such task ambiguity by proposing AmbiBench, a new benchmark of six ambiguously specified classification tasks. We evaluate humans and models on AmbiBench by seeing how well they identify the intended task using 1) instructions with varying degrees of ambiguity, and 2) different numbers of labeled examples. We find that the combination of model scaling (to 175B parameters) and training with human feedback data enables models to approach or exceed the accuracy of human participants across tasks, but that either one alone is not sufficient. In addition, we show how to dramatically improve the accuracy of language models trained without large-scale human feedback training by finetuning on a small number of ambiguous in-context examples, providing a promising direction for teaching models to generalize well in the face of ambiguity.
Translated by Google Translate
This paper proposes a novel self-supervised Cut-and-Paste GAN to perform foreground object segmentation and generate realistic composite images without manual annotations. We accomplish this goal by a simple yet effective self-supervised approach coupled with a U-Net based discriminator. The proposed method extends the ability of standard discriminators to learn not only global data representations via classification (real/fake) but also semantic and structural information through pseudo labels created using the self-supervised task. The proposed method empowers the generator to create meaningful masks by forcing it to learn from informative per-pixel as well as global image feedback from the discriminator. Our experiments demonstrate that our proposed method significantly outperforms the state-of-the-art methods on the standard benchmark datasets.
A canonical algorithm for log-concave sampling is the Langevin Algorithm, aka the Langevin Diffusion run with some discretization stepsize $\eta > 0$. This discretization leads the Langevin Algorithm to have a stationary distribution $\pi_{\eta}$ which differs from the stationary distribution $\pi$ of the Langevin Diffusion, and it is an important challenge to understand whether the well-known properties of $\pi$ extend to $\pi_{\eta}$. In particular, while concentration properties such as isoperimetry and rapidly decaying tails are classically known for $\pi$, the analogous properties for $\pi_{\eta}$ are open questions with direct algorithmic implications. This note provides a first step in this direction by establishing concentration results for $\pi_{\eta}$ that mirror classical results for $\pi$. Specifically, we show that for any nontrivial stepsize $\eta > 0$, $\pi_{\eta}$ is sub-exponential (respectively, sub-Gaussian) when the potential is convex (respectively, strongly convex). Moreover, the concentration bounds we show are essentially tight. Key to our analysis is the use of a rotation-invariant moment generating function (aka Bessel function) to study the stationary dynamics of the Langevin Algorithm. This technique may be of independent interest because it enables directly analyzing the discrete-time stationary distribution $\pi_{\eta}$ without going through the continuous-time stationary distribution $\pi$ as an intermediary.
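As a small illustration of why $\pi_{\eta}$ differs from $\pi$, the sketch below runs the Langevin Algorithm in one dimension for the potential $f(x) = x^2/2$, whose Langevin Diffusion has stationary distribution $\pi = N(0, 1)$. This potential, the stepsize value, and all function names are assumptions chosen for illustration and are not taken from the note. For this particular potential the update is linear, so the stationary variance of the discretized chain can be worked out by hand as $1/(1 - \eta/2)$ for $0 < \eta < 2$, which is strictly larger than 1.

```python
import math
import random


def langevin_chain(grad, eta, n_steps, x0=0.0, seed=0):
    """Run the 1-D Langevin Algorithm: x <- x - eta*grad(x) + sqrt(2*eta)*N(0,1)."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        x = x - eta * grad(x) + math.sqrt(2.0 * eta) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def variance(xs):
    """Plain (biased) sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def discretized_variance(eta):
    """Stationary variance of the chain for f(x) = x^2/2 (valid for 0 < eta < 2).

    Solving v = (1 - eta)^2 * v + 2*eta gives v = 1 / (1 - eta/2).
    """
    return 1.0 / (1.0 - eta / 2.0)


if __name__ == "__main__":
    eta = 0.5  # illustrative stepsize, not from the note
    samples = langevin_chain(grad=lambda x: x, eta=eta, n_steps=200_000)
    print("empirical variance of the chain:", variance(samples[1000:]))
    print("closed form for pi_eta:         ", discretized_variance(eta))
    print("variance of pi (continuous time):", 1.0)
```

The empirical variance should track the closed form for the discretized chain rather than the variance of $\pi$, which is the bias the note's concentration results have to account for.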
The paper presents a cross-domain review analysis on four popular review datasets: Amazon, Yelp, Steam, IMDb. The analysis is performed using Hadoop and Spark, which allows for efficient and scalable processing of large datasets. By examining close to 12 million reviews from these four online forums, we hope to uncover interesting trends in sales and customer sentiment over the years. Our analysis will include a study of the number of reviews and their distribution over time, as well as an examination of the relationship between various review attributes such as upvotes, creation time, rating, and sentiment. By comparing the reviews across different domains, we hope to gain insight into the factors that drive customer satisfaction and engagement in different product categories.
IoT sensors, especially video cameras, are ubiquitously deployed around the world to perform a variety of computer vision tasks in several verticals including retail, healthcare, safety and security, transportation, and manufacturing. To amortize their high deployment effort and cost, it is desirable to perform multiple video analytics tasks, which we refer to as Analytical Units (AUs), off the video feed coming out of every camera. In this paper, we first show that in a multi-AU setting, changing the camera setting has a disproportionate impact on the performance of different AUs. In particular, the optimal setting for one AU may severely degrade the performance of another AU, and the impact on different AUs varies as the environmental condition changes. We then present Elixir, a system to enhance the video stream quality for multiple analytics on a video stream. Elixir leverages Multi-Objective Reinforcement Learning (MORL), where the RL agent caters to the objectives from different AUs and adjusts the camera setting to simultaneously enhance the performance of all AUs. To define the multiple objectives in MORL, we develop new AU-specific quality estimator values for each individual AU. We evaluate Elixir through real-world experiments on a testbed with three cameras deployed next to each other (overlooking a large enterprise parking lot) running Elixir and two baseline approaches, respectively. Elixir correctly detects 7.1% (22,068) and 5.0% (15,731) more cars, 94% (551) and 72% (478) more faces, and 670.4% (4975) and 158.6% (3507) more persons than the default-setting and time-sharing approaches, respectively. It also detects 115 license plates, far more than the time-sharing approach (7) and the default setting (0).
This paper considers adaptive radar electronic counter-counter measures (ECCM) to mitigate ECM by an adversarial jammer. Our ECCM approach models the jammer-radar interaction as a Principal Agent Problem (PAP), a popular economics framework for interaction between two entities with an information imbalance. In our setup, the radar does not know the jammer's utility. Instead, the radar learns the jammer's utility adaptively over time using inverse reinforcement learning. The radar's adaptive ECCM objective is two-fold: (1) maximize its utility by solving the PAP, and (2) estimate the jammer's utility by observing its response. Our adaptive ECCM scheme uses deep ideas from revealed preference in micro-economics and the principal agent problem in contract theory. Our numerical results show that, over time, our adaptive ECCM both identifies and mitigates the jammer's utility.
Training effective embodied AI agents often involves manual reward engineering, expert imitation, specialized components such as maps, or leveraging additional sensors for depth and localization. Another approach is to use neural architectures alongside self-supervised objectives which encourage better representation learning. In practice, there are few guarantees that these self-supervised objectives encode task-relevant information. We propose the Scene Graph Contrastive (SGC) loss, which uses scene graphs as general-purpose, training-only, supervisory signals. The SGC loss does away with explicit graph decoding and instead uses contrastive learning to align an agent's representation with a rich graphical encoding of its environment. The SGC loss is generally applicable, simple to implement, and encourages representations that encode objects' semantics, relationships, and history. Using the SGC loss, we attain significant gains on three embodied tasks: Object Navigation, Multi-Object Navigation, and Arm Point Navigation. Finally, we present studies and analyses which demonstrate the ability of our trained representation to encode semantic cues about the environment.
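The abstract does not spell out the form of the SGC loss, so the following is only a generic sketch of what "using contrastive learning to align an agent's representation with a graphical encoding" can look like. It implements a standard InfoNCE-style objective in plain Python; the function name `info_nce`, the temperature value, and the toy embeddings are all hypothetical and should not be read as the authors' formulation.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(agent_embs, graph_embs, temperature=0.1):
    """Generic InfoNCE loss (an assumption, not the paper's exact SGC loss).

    agent_embs[i] and graph_embs[i] form the positive pair; every other
    graph embedding in the batch acts as a negative for agent_embs[i].
    """
    agent = [normalize(v) for v in agent_embs]
    graph = [normalize(v) for v in graph_embs]
    loss = 0.0
    for i, a in enumerate(agent):
        logits = [dot(a, g) / temperature for g in graph]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]  # cross-entropy with the matching index as label
    return loss / len(agent)


if __name__ == "__main__":
    agent = [[1.0, 0.0], [0.0, 1.0]]
    aligned_graph = [[1.0, 0.0], [0.0, 1.0]]
    shuffled_graph = [[0.0, 1.0], [1.0, 0.0]]
    print("loss with aligned pairs: ", info_nce(agent, aligned_graph))
    print("loss with shuffled pairs:", info_nce(agent, shuffled_graph))
```

The loss is small when each agent embedding is closest to its own scene-graph embedding and large when the pairing is scrambled, which is the alignment pressure a contrastive objective of this kind provides.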
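It is common practice to treat a video as a sequence of images (frames) and to reuse, for video, deep neural network models that were trained only on images. In this paper, we show that this leap of faith, namely that deep learning models that work well on images will also work well on videos, does not hold. We show that even when a camera is viewing a scene that is not changing in any perceptible way, and we control for external factors such as video compression and the environment (lighting), the accuracy of video analytics applications fluctuates noticeably. These fluctuations occur because consecutive frames produced by the camera may look visually similar, yet the video analytics applications perceive them quite differently. We observe that the root cause of these fluctuations is the dynamic camera parameter changes that the camera makes automatically in order to capture and produce visually pleasing video. The camera inadvertently acts as an unintentional adversary because, as we show, these slight changes in image pixel values across consecutive frames have a noticeably adverse impact on the accuracy of insights from video analytics tasks that reuse image-trained deep learning models. To address this inadvertent adversarial effect from the camera, we explore transfer learning techniques that improve learning in video analytics tasks by transferring knowledge learned on image analytics tasks. In particular, we show that our newly trained YOLOv5 model reduces fluctuation in object detection across frames, which leads to better object tracking (40% fewer tracking mistakes). Our paper also provides new directions and techniques for mitigating the camera's adversarial effect on deep learning models used for video analytics applications.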
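In this work, we present our proposed method for segmenting the pulmonary arteries from CT scans using Swin UNETR and U-Net-based deep neural network architectures. Six models, three based on Swin UNETR and three based on 3D U-Net, were combined using a weighted average to produce the final segmentation masks. Our team achieved a multi-level Dice score of 84.36% with this approach. The code for our work is available at the following link: https://github.com/akansh12/parse2022. This work is part of the MICCAI PARSE 2022 challenge.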
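The weighted-average ensembling step and the Dice metric can be sketched as follows. This is a minimal illustration on flattened toy probability maps, not the code from the linked repository: the weights, the 0.5 threshold, the function names, and the use of a plain binary Dice score (rather than the challenge's multi-level Dice) are all assumptions.

```python
def weighted_average(prob_maps, weights):
    """Voxel-wise weighted mean of per-model foreground probabilities."""
    total = sum(weights)
    n_voxels = len(prob_maps[0])
    return [
        sum(w * pm[i] for pm, w in zip(prob_maps, weights)) / total
        for i in range(n_voxels)
    ]


def binarize(probs, threshold=0.5):
    """Turn averaged probabilities into a binary mask (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probs]


def dice(pred, target):
    """Binary Dice score: 2 * |pred AND target| / (|pred| + |target|)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 if denom == 0 else 2.0 * intersection / denom


if __name__ == "__main__":
    # Three hypothetical models, each predicting four voxels.
    model_probs = [
        [0.9, 0.2, 0.6, 0.1],
        [0.8, 0.4, 0.3, 0.2],
        [0.7, 0.6, 0.4, 0.0],
    ]
    weights = [2.0, 1.0, 1.0]  # illustrative weights, not from the paper
    mask = binarize(weighted_average(model_probs, weights))
    ground_truth = [1, 0, 1, 0]
    print("ensemble mask:", mask)
    print("Dice:", dice(mask, ground_truth))
```

Averaging probabilities before thresholding lets a confident model outvote uncertain ones, which is one common reason to prefer it over majority voting on hard masks.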
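Indoor thermal comfort in smart buildings has a significant impact on the health and performance of occupants. Consequently, machine learning (ML) is increasingly used to address challenges related to indoor thermal comfort. Temporal variability in thermal comfort perception is an important issue that governs occupant well-being and energy consumption. However, most ML-based thermal comfort studies do not consider temporal aspects such as the time of day, circadian rhythm, and outdoor temperature. This work addresses these problems. It investigates the impact of circadian rhythm and outdoor temperature on the prediction accuracy and classification performance of ML models. The data was collected through month-long field experiments carried out in 14 classrooms with 512 primary school students. Four thermal comfort metrics are considered as the outputs of deep neural network and support vector machine models for the dataset. The effect of temporal variability on school children's comfort is shown through a "time of day" analysis. Temporal variability in prediction accuracy is demonstrated (up to 80%). Furthermore, we show that outdoor temperature (which varies over time) positively impacts the prediction performance of thermal comfort models by up to 30%. The importance of spatio-temporal context is demonstrated by contrasting micro-level (location-specific) and macro-level (6 locations across a city) performance. The most important finding of this work is that, for multiple thermal comfort metrics, a clear improvement in prediction accuracy is shown as time of day and sky illuminance increase.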